An unknown number, $N$, of errors exist in a certain product, for example, defects in a production lot, errors in a manuscript, or bugs in a computer program. $I$ inspectors with possibly different competencies are to be put to work to find the errors. How should the inspection be organized, and what is a good estimate of the undetected errors (or of $N$)?

This problem is similar to the capture-recapture sampling problem of population biology, assuming a closed population and a parallel search effort, for which many classical results are available. For example, in the case $I = 2$, the Petersen method estimates $N$ as:

$\hat{N} = \dfrac{n(1)\,n(2)}{n_{12}}$

where $n(i)$ is the total number of errors found by inspector $i$ $(i = 1, 2)$ and $n_{12}$ is the number of defects found by both inspectors. A corresponding maximum-likelihood estimate of $N$ in the general case is due to Chapman and Darroch, and must be solved recursively (see Seber (1982) for a summary of animal census methods).

Apart from an elementary analysis of the $I = 2$ case by Haskell and George (1972) and some sequential sampling plans by Freeman (1973), the only Bayesian approach to this problem appears to be by Castledine (1981), who obtains rather complicated results appropriate to the population biology model.

In this paper, we develop the model in a manner more related to error detection problems by first assuming that $N$ is Poisson with parameter $\lambda$, and that the detection of defects follows a multinomial law, with independent detection probabilities, $p_i$ $(i = 1, 2, \ldots, I)$. The maximum likelihood estimator of $\lambda$ has the same form as the Chapman-Darroch estimator, and a similar result obtains for $Q = \prod_i (1 - p_i)$, the probability that a given error is overlooked during parallel search.

Next, we analyze the problem in which $\lambda$ and the $p_i$ are random quantities, by assuming that they are Gamma- and Beta-distributed, respectively. The resulting prediction of the number of unfound errors (the mean of the predictive distribution) can then be expressed as a weighted sum of products of linear "credibility" predictions for $\lambda$ and the $p_i$. Surprisingly, the predictive density can be calculated exactly through a recursive relationship which shows that the density is negative binomial in the tails. In the limit, as the prior variances of $\lambda$ and the $p_i$ increase without bound, the predictive mode approaches the Chapman-Darroch estimator; if we have strong prior information, the mode is given by a generalized Chapman-Darroch form involving credibility formulae.


BAYESIAN  ESTIMATION  OF  UNDETECTED  ERRORS 
by 

William  S.  Jewell 

1.  INTRODUCTION 

A number of estimation problems in reliability can be described as follows: a certain product has an unknown number, $N$, of defects. A group of $I$ inspectors each allocates a given amount of independent effort to finding and removing the defects. After finding, say, $n_T$ total defects, what is the estimated number, $n_0 = N - n_T$, of undetected defects still left in the product?

For example, in manufacturing quality control, the product may be a certain production lot for which the inspectors may use visual or machine-aided techniques to inspect a portion or all of the items. Estimation of the number of undiscovered defects in the sample scrutinized is the first step in setting quality assurance levels for the entire lot.

In software reliability, the defects correspond to program errors or bugs that can be detected and removed by programmers using some combination of visual scanning of program code and of experimental running of the program on typical input. The estimation of undetected errors remaining in the program not only helps certify the application-readiness of the software, but also provides an indication of the effort that will be needed for customer support and for the upgrading of future program releases. A similar interpretation arises in the proofreading of manuscripts for misprints.

Superficially, this model is similar to the problem of estimating the ultimate failure rate of a product during the reliability growth (learning curve) phase of product testing and development (see, e.g., Jewell (1982)). However, in that application, an unspecified external process of design improvement reduces the stochastic rate of recurrence of product "failures" according to some given law, whose parameters are to be estimated. In this model, on the other hand, an inspector is assumed to actually remove (or at least to identify) one of a finite number of defects or errors, so that, at the end of inspection, there remain only a smaller number of unfound errors. Further, as we shall see below, there is an advantage to having the inspectors work in parallel on the same product, rather than in series, as this helps make more precise any uncertainty in the inspection efficiencies of the different examiners, and thus improves the estimate of undetected errors.

After specifying the basic model, we find first a simple point estimator for $N$ that was originally developed in the field of population biology (by Petersen, Chapman, Darroch, and others) for estimating the size of a closed animal population through capture-recapture sampling. We then make the additional assumption that $N$ is Poisson with parameter $\lambda$, and show that the MLE for $\lambda$ has the same form as the classical estimator of $N$.

We then analyze the problem from a Bayesian point of view by assuming that $\lambda$ and the detection probabilities for each inspector are random quantities with Gamma- and Beta-prior densities, respectively. After computing the rather complex posterior densities of the parameters, we then find a simpler expression for the predictive density of $n_0$ in recursive form, showing that this density is Negative Binomial in the tails. Moments of this predictive density can only be expressed as a ratio of complex weighted sums of products of linear "credibility" predictors for $\lambda$ and the detection probabilities; however, the posterior mode of $n_0$ can be rearranged into the form of a generalized Petersen-Chapman-Darroch estimator using credibility formulae.


The  paper  concludes  with  examples  of  numerical  calculations  of  the 
predictive  density  and  remarks  on  model  extensions. 

I would like to express my appreciation to Sheldon Ross, who introduced me to this problem area through the paper of Polya (1976), and to Dennis Lindley, who pointed out the connection with capture-recapture census methods.

2. BASIC MODEL; SERIES AND PARALLEL SEARCH STRATEGIES

Suppose  that  the  error  inspection  process  is  such  that: 

(a)  Each  error  present  has  the  same  probability  of  being  detected 
by  a  given  inspector; 

(b) The probability that inspector $i$ will find any given error is $p_i$ $(i = 1, 2, \ldots, I)$, independent of previous errors found by $i$ or by any other inspector.

The simplest possible strategy for organizing a search by $I$ inspectors is a serial one, in which: inspector #1 examines the raw product (which has an unknown number, $N$, of errors), and finds and removes $n_1$ errors; inspector #2 then examines the product (which now has $N - n_1$ errors), finding and removing $n_2$ errors; and so on, until the $I$th inspector finds and removes $n_I$ of the $N - (n_1 + n_2 + \cdots + n_{I-1})$ errors remaining. It follows from the assumptions above that each of the unknown $n_i$ is conditionally Binomially distributed with parameters $(p_i,\; N - (n_1 + n_2 + \cdots + n_{i-1}))$, so that the joint conditional density of the $I$ pieces of data, i.e., of $(n_1, n_2, \ldots, n_I \mid N ; \mathbf{p})$, where $\mathbf{p} = (p_1, p_2, \ldots, p_I)$, is easily found. The total number of detected errors in serial search is $n_T = n_1 + n_2 + \cdots + n_I$, so the number of undetected errors is $n_0 = N - n_T$. More importantly, since each error, if present, is missed by $i$ with probability $q_i = 1 - p_i$ $(i = 1, 2, \ldots, I)$, the total overlook probability (probability of being undetected by any inspector) for every error is $Q = \prod_{i=1}^{I} q_i$, and thus the conditional density of undetected errors, $(n_0 \mid N ; \mathbf{p})$, is Binomial$(Q, N)$.

A parallel search strategy is more complicated, since here we assume, either that the inspectors all work independently on identical copies of the product, or that they work in some sequence on a single product, (secretly) identifying, but not removing, the defects which they find. With this strategy, there will usually be duplication in the defects found by different inspectors, and the lists of defects reported by each will have to be reconciled, classifying and counting the errors in the following mutually exclusive and collectively exhaustive categories:

$n_i$ - the number of defects found only by inspector $i$;
$n_{ij}$ - the number of defects found jointly only by $i$ and $j$ $(i < j)$;
$n_{ijk}$ - the number of defects found jointly only by $i$, $j$, and $k$ $(i < j < k)$;
...
$n_{123 \ldots I}$ - the number of defects found jointly by all inspectors.


Thus, there will be $2^I - 1$ separate pieces of observed data:

$\mathcal{D} = \{(n_i);\ (n_{ij});\ (n_{ijk});\ \ldots;\ n_{123 \ldots I}\}$.


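Inspector $i$ finds, in total:

(2.1)  $n(i) = n_i + \sum_{j \neq i} n_{ij} + \sum\sum_{j < k;\ j,k \neq i} n_{ijk} + \cdots + n_{123 \ldots I}$

defects (subscripts being read as unordered sets of inspectors), and the total number of distinct defects found by all inspectors is:

(2.2)  $n_T = \sum_{i} n_i + \sum\sum_{i<j} n_{ij} + \sum\sum\sum_{i<j<k} n_{ijk} + \cdots + n_{123 \ldots I}$

$\phantom{n_T} = \sum_{i} n(i) - 1 \cdot \sum\sum_{i<j} n_{ij} - 2 \cdot \sum\sum\sum_{i<j<k} n_{ijk} - \cdots - (I - 1)\, n_{123 \ldots I}$.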
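
As an illustration of the bookkeeping behind (2.1) and (2.2), the following Python sketch reconciles the inspectors' lists into the mutually exclusive categories and checks both expressions for the total number of distinct defects. The function name `reconcile` and the toy defect labels are hypothetical, chosen only for this example.

```python
from collections import Counter

def reconcile(found_by):
    """found_by[i] is the set of defect labels reported by inspector i (0-based)."""
    # Map every distinct defect to the set of inspectors who reported it.
    finders = {}
    for i, defects in enumerate(found_by):
        for d in defects:
            finders.setdefault(d, set()).add(i)
    # Category counts n_i, n_ij, n_ijk, ..., keyed by the exact group of finders.
    categories = Counter(frozenset(group) for group in finders.values())
    n_i_total = [len(defects) for defects in found_by]  # n(i), as in (2.1)
    n_T = len(finders)                                  # distinct defects, as in (2.2)
    return categories, n_i_total, n_T

# Toy data (assumed): three inspectors, six distinct defects.
found_by = [{"a", "b", "c", "d"}, {"b", "c", "e"}, {"c", "f"}]
categories, n_i_total, n_T = reconcile(found_by)

# Second form of (2.2): a defect found by m inspectors is counted m times in
# the sum of the n(i), so (m - 1) copies must be subtracted.
n_T_check = sum(n_i_total) - sum((len(g) - 1) * c for g, c in categories.items())
print(n_i_total, n_T, n_T_check)
```

Only the totals `n_i_total` and `n_T` are needed by the classical estimators; the full category table is kept here just to verify the identity.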


The joint conditional density of $\mathcal{D}$ and $n_0 = N - n_T$ is derived in the next section. Note that, in spite of the additional complexity of parallel search, it again follows from the assumptions that the total overlook probability for each error is $Q$, and hence $(n_0 \mid N ; \mathbf{p})$ is again Binomial$(Q, N)$. Thus, for fixed $N$ and $\mathbf{p}$, the density of undetected errors is independent of the search strategy.*
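
The claim that the number of undetected errors has the same Binomial$(Q, N)$ law under either strategy can be checked by a small Monte Carlo experiment. The sketch below is illustrative only: the values of `N`, `p`, the seed, and the number of runs are assumptions, and the two function names are hypothetical.

```python
import random

def serial_undetected(N, p, rng):
    """Inspectors work in turn; each removes the errors it finds."""
    remaining = N
    for p_i in p:
        found = sum(rng.random() < p_i for _ in range(remaining))
        remaining -= found
    return remaining

def parallel_undetected(N, p, rng):
    """Every inspector sees all N errors; an error survives only if all miss it."""
    return sum(all(rng.random() >= p_i for p_i in p) for _ in range(N))

rng = random.Random(1)
N, p = 100, [0.3, 0.5, 0.2]          # assumed illustrative values
Q = 1.0
for p_i in p:
    Q *= 1.0 - p_i                   # total overlook probability, here 0.28

runs = 2000
mean_serial = sum(serial_undetected(N, p, rng) for _ in range(runs)) / runs
mean_parallel = sum(parallel_undetected(N, p, rng) for _ in range(runs)) / runs
print(f"N*Q = {N * Q:.2f}, serial = {mean_serial:.2f}, parallel = {mean_parallel:.2f}")
```

Both sample means should lie close to $NQ$; the difference between the strategies is in what else is observed, not in the law of the undetected count.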

Why, then, would one be interested in parallel search? The answer lies in the fact that, by permitting duplicate errors to be found, we gain additional information about the detection probabilities $(p_i)$, so that if they are unknown quantities at the beginning of inspection, the increased data set associated with parallel search will provide increased precision in the posterior densities of both $\mathbf{p}$ and $n_0$. Henceforth, we shall assume that a parallel search for errors has been made.


* Of course, if defect removal occupies a substantial portion of the inspection effort, then the two search strategies are no longer comparable in the sense described above.


3.  THE  PETERSEN-CHAPMAN-DARROCH  ESTIMATORS 


We begin by deriving some classical point estimators for $N$ using heuristic arguments. For $I = 2$, we can argue as in Polya (1976) that, if $p_1$ were known, $\hat{N} = n(1)/p_1$ is a reasonable point estimate of the unknown total number of errors. On the other hand, there is also the estimator $\hat{p}_1 = n_{12}/n(2)$ for the first detection probability, since, of the $n(2)$ total errors found by the second inspector, $n_{12}$ were also found by the first. Combining these two estimates, we have:


(3.1)  $\hat{N} = \dfrac{n(1)\,n(2)}{n_{12}} = n_T + \dfrac{n_1 n_2}{n_{12}}$ ;  $\hat{p}_i = \dfrac{n(i)}{\hat{N}}$  $(i = 1, 2)$


Note  that  this  argument  is  symmetric  with  respect  to  the  two  inspectors,  and 
that  both  singly-found  and  jointly-found  defects  are  important. 
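
A short numerical sketch of the two-inspector estimator (3.1) follows. The function name `petersen` and the counts are hypothetical, and the code simply evaluates the formula; it raises an error when no defect was found jointly, since the estimator is then undefined.

```python
def petersen(n1_only, n2_only, n12):
    """Two-inspector estimate of N from singly and jointly found counts."""
    if n12 == 0:
        raise ValueError("no jointly found defects: the estimator is undefined")
    n_1 = n1_only + n12            # n(1): total found by inspector 1
    n_2 = n2_only + n12            # n(2): total found by inspector 2
    n_T = n1_only + n2_only + n12  # distinct defects found
    N_hat = n_1 * n_2 / n12
    return N_hat, n_1 / N_hat, n_2 / N_hat, n_T

# Assumed illustrative counts.
N_hat, p1_hat, p2_hat, n_T = petersen(n1_only=20, n2_only=10, n12=30)
print(N_hat, p1_hat, p2_hat, N_hat - n_T)  # last value: estimated undetected errors
```

With these counts, `p1_hat` coincides with $n_{12}/n(2)$ and `p2_hat` with $n_{12}/n(1)$, which is the symmetry noted above.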

With $I > 2$, a slightly different argument is needed. Let the unknown $N$ be decomposed into found and unfound defects, and replace the latter by its mean value with fixed $\mathbf{p}$:

$N = n_T + n_0 \approx n_T + QN$.

However, the unknown miss probabilities, $q_i$, can be estimated for fixed $N$ by $1 - (n(i)/N)$, so that, combining the two estimates:

(3.2)  $\hat{N} = n_T + \hat{N} \prod_{i=1}^{I} \left[ 1 - \dfrac{n(i)}{\hat{N}} \right]$ ;

(3.3)  $\hat{p}_i = \dfrac{n(i)}{\hat{N}}$  $(i = 1, 2, \ldots, I)$.

For $I = 2$, these formulae reduce to (3.1), while for $I = 3$, they require the solution of a quadratic equation, etc. A variety of approximating and iterative procedures are available for (3.2); see Seber (1982). A good initial approximation in the general case is:


(3.4)  $\hat{N} \approx \dfrac{\sum\sum_{i<j} n(i)\,n(j)}{\sum\sum_{i<j} n_{ij}}$


which is reminiscent of (3.1). It is easy to show that, if all $n(i)$ are equal to each other and to $n_T$, then $\hat{N} = n_T$; otherwise, (3.2) has a unique finite root $\hat{N} > n_T$.
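
One simple way to solve (3.2) numerically is bisection on the function $N\,[1 - \prod_i (1 - n(i)/N)] - n_T$. This is a sketch under stated assumptions, not one of the procedures surveyed by Seber: the function name `pcd_estimate` is hypothetical, the bracketing by repeated doubling is a choice made here, and the code reports failure when the inspectors' lists do not overlap at all (in which case no finite root is found).

```python
import math

def pcd_estimate(n_i_total, n_T, tol=1e-10):
    """Root of N = n_T + N * prod(1 - n(i)/N), searched for on N >= n_T."""
    if sum(n_i_total) <= n_T:
        raise ValueError("no overlap between inspectors: no finite root")

    def detected(N):
        # Expected number of distinct finds if the total were N.
        return N * (1.0 - math.prod(1.0 - n / N for n in n_i_total))

    lo = float(n_T)
    if detected(lo) >= n_T:      # some inspector found every detected error
        return lo
    hi = 2.0 * lo
    while detected(hi) < n_T:    # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if detected(mid) < n_T:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed illustrative totals.
N2 = pcd_estimate([50, 40], 60)       # I = 2: should agree with n(1)n(2)/n12
N3 = pcd_estimate([50, 40, 30], 70)   # I = 3: root of a quadratic
print(N2, N3)
```

For the $I = 3$ data the equation reduces to $N^2 - 94N + 1200 = 0$, whose larger root is the one returned.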

In spite of the appearance of the $n_{ij}$ in (3.1) and (3.4), it should be clear from (3.2) that only the $I + 1$ pieces of information in the reduced data set, $\mathcal{D}^* = \{(n(i)) ; n_T\}$, are needed to estimate $N$.

(3.1) has a long history in the statistical literature; it was apparently first used by Laplace in 1783 to estimate the population of France.

In population biology, it arises in the capture-recapture sampling of a fixed, but unknown animal population, in which $n(1)$ animals are captured and marked in some distinctive fashion, and then released to mix with the general population. At some later time, when ideal mixing is thought to have occurred, a second sample of $n(2)$ animals is recaptured, of which $n_{12}$ are observed to be already marked. $\hat{N}$ then estimates the total animal census, and is generally called the (C.G.J.) Petersen method, after the Danish fishery biologist who used it to study plaice populations in 1889; however, it is also attributed to a Norwegian, K. Dahl, in 1917, and by ornithologists, to an American, F.C. Lincoln, who calculated waterfowl abundance in 1930. Further details may be found in Seber (1982).


The case $I > 2$ corresponds to a multiple capture-recapture sampling of a closed population, in which successive catches are distinctively (re)marked and then released, in what is called a Schnabel census. The estimator (3.2) was first obtained by Chapman (1952), thus showing that the reduced data set $\mathcal{D}^*$ is sufficient for $N$ and $\mathbf{p}$; in animal census terminology, this means that the complete capture history, e.g., distinctive remarking, is not needed to estimate the size of a closed population. Darroch (1958) then clarified the derivation of (3.2) and analyzed its properties. Since that time, there has been an explosion of generalizations of this approach in the biometric literature, as well as adaptations to epidemiology and other fields; again, Seber (1982) provides the most convenient summary. By far, the literature uses classical estimation techniques; the Bayesian literature is described below in Section 9.

4. NUMERICAL BEHAVIOR OF THE CLASSICAL ESTIMATOR

To obtain some idea of the empirical properties of $\hat{N}$ (and hence of $\hat{\lambda}$ in (5.5)), simulations of the error detection process were run with a true value of $N = 100$, for $I = 2$, $4$, and $8$ inspectors, and with a range of common detection probabilities, $p_i = p = .05\,(.05)\,.30\,(.10)\,.90$, i.e., from .05 to .30 in steps of .05 and from .30 to .90 in steps of .10. 100 samples provided sufficient stability for large $p$, but 200 samples were needed for smaller values of $p$, as often the estimator did not exist for small $p$ because no overlap in detection occurred.

The results are summarized in Figures 1, 2, and 3. For very small values of $p$, $\hat{N}$ is badly underbiased, then swings briefly to overbiased values before settling down to the true value as $p$ approaches unity. This effect occurs at lower values of $p$, and is reduced in magnitude, by increasing $I$. However, looking at the quantiles, we see that the distribution of possible values of $\hat{N}$ is very unstable, and probably unacceptable, for low values of $p$.
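
A simulation of this kind can be sketched as follows. This is not the author's program: the seed, the three values of $p$, the bisection solver, and the names `pcd_estimate` and `simulate` are assumptions made for illustration, and runs with no overlap between inspectors are counted as cases where the estimator does not exist.

```python
import math
import random
import statistics

def pcd_estimate(n_i_total, n_T):
    """Root of N = n_T + N * prod(1 - n(i)/N) by bisection; None if no overlap."""
    if n_T == 0 or sum(n_i_total) <= n_T:
        return None
    def detected(N):
        return N * (1.0 - math.prod(1.0 - n / N for n in n_i_total))
    lo, hi = float(n_T), 2.0 * n_T
    if detected(lo) >= n_T:
        return lo
    while detected(hi) < n_T:
        hi *= 2.0
    for _ in range(100):               # fixed number of halvings
        mid = 0.5 * (lo + hi)
        if detected(mid) < n_T:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(N, I, p, runs, rng):
    """Parallel search by I inspectors with a common detection probability p."""
    estimates, failures = [], 0
    for _ in range(runs):
        hits = [[rng.random() < p for _ in range(I)] for _ in range(N)]
        n_i_total = [sum(h[i] for h in hits) for i in range(I)]
        n_T = sum(any(h) for h in hits)
        est = pcd_estimate(n_i_total, n_T)
        if est is None:
            failures += 1
        else:
            estimates.append(est)
    return estimates, failures

rng = random.Random(0)
summary = {}
for p in (0.05, 0.2, 0.5):
    estimates, failures = simulate(N=100, I=4, p=p, runs=200, rng=rng)
    summary[p] = (statistics.mean(estimates), statistics.median(estimates), failures)
    print(p, summary[p])
```

The printed mean, median, and failure count per value of $p$ give a rough view of the bias and instability described above; quantiles could be added with `statistics.quantiles`.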


FIGURE 2. Empirical Behavior of Classical Estimator $\hat{N}$ versus Common Detection Probability $p$, with $I = 4$ Inspectors.

(which  is  expected  from  first  principles),  and  the  data  likelihood  is: 


(5.4)  $p(\mathcal{D}^* \mid \lambda , \mathbf{p}) \propto \lambda^{n_T} e^{-\lambda(1 - Q)} \prod_{i=1}^{I} p_i^{\,n(i)}\, q_i^{\,n_T - n(i)}$ .


In other words, the reduced data set, $\mathcal{D}^* = \{(n(i)) ; n_T\}$, is sufficient for both $\lambda$ and $\mathbf{p}$.

The maximum-likelihood estimates of the parameters are now:

(5.5)  $\hat{\lambda} = n_T + \hat{\lambda} \prod_{i=1}^{I} (1 - \hat{p}_i)$ ;

(5.6)  $\hat{p}_i = \dfrac{n(i)}{\hat{\lambda}}$  $(i = 1, 2, \ldots, I)$ ,

which can be compared with (3.2) and (3.3). In other words, the MLE of $\lambda$ is exactly the Petersen-Chapman-Darroch estimator for $N$.
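
For readers who want the intermediate step, the stationarity conditions behind these estimates can be written out as below. This is a reconstruction from the likelihood (5.4), using $q_i = 1 - p_i$ and $Q = \prod_i q_i$, and is offered as a sketch rather than as the author's derivation.

```latex
\begin{align*}
\log L(\lambda, \mathbf{p})
  &= n_T \log \lambda - \lambda (1 - Q)
     + \sum_{i=1}^{I} \bigl[ n(i) \log p_i + (n_T - n(i)) \log q_i \bigr]
     + \text{const}, \\
\frac{\partial \log L}{\partial \lambda} = 0
  &\;\Longrightarrow\; \frac{n_T}{\lambda} = 1 - Q
  \;\Longrightarrow\; \lambda = n_T + \lambda Q, \\
\frac{\partial \log L}{\partial p_i} = 0
  &\;\Longrightarrow\; \frac{n(i)}{p_i} = \frac{n_T - n(i) + \lambda Q}{q_i}
   = \frac{\lambda - n(i)}{q_i}
  \;\Longrightarrow\; p_i = \frac{n(i)}{\lambda}.
\end{align*}
```

The second line uses $\partial Q / \partial p_i = -Q/q_i$, and the last step substitutes $\lambda Q = \lambda - n_T$ from the first condition.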


6.  A  BAYESIAN  MODEL 


The model just developed is unsatisfactory in most applications because (5.3) depends upon $\lambda$ and $\mathbf{p}$ being known exactly, thus giving, for example, $E\{n_0 \mid \lambda , \mathbf{p}\} = \lambda Q$. Usually, these parameters will not be known precisely, and so we will henceforth assume that these are random quantities, with given prior distributions. In this way, the search for errors will also provide us with updated estimates of the rate of error occurrence for this particular product and for the current inspector performance parameters.

Because of the complexity of our final results, even with simple priors, we begin first with cases in which either $\lambda$ or $\mathbf{p}$ is known, a priori. This permits us to review known results on appropriate natural conjugate priors, and to suggest methods for estimating hyperparameters. For simplicity, hyperparameters are omitted as explicit arguments, except in priors.

One special notation is convenient in the sequel. If the predictive mean of some random variable $y$ is a linear function of a "natural estimator", $\hat{y} = \hat{y}(\mathcal{D}_y)$, formed from the data, $\mathcal{D}_y$, then we refer to the formula for the predictive mean as a "credibility estimator", because it generally has the form:

(6.1a)  $E\{y \mid \mathcal{D}_y\} = (1 - Z)\,E\{y\} + Z\,\hat{y}(\mathcal{D}_y) \overset{\text{def}}{=} f_y(\hat{y} ; n , v)$ ,

where

(6.1b)  $Z = Z(n, v) = \dfrac{n}{n + v}$

is the "credibility factor" which mixes the prior mean, $E\{y\}$, and the natural estimator, $\hat{y}(\mathcal{D}_y)$. $n$ is the "equivalent number of samples" in the data, and $v$ is the "credibility time constant". This terminology is from


the  field  of  actuarial  science,  but  formulae  of  this  type  occur  repeatedly 
in  Bayesian  prediction  or  in  least-squared  approximations  to  predictive  means. 

A  complete  bibliography  of  credibility  theory  through  1981  is  promised  as  a 
forthcoming  special  issue  of  Insurance  Abstracts  and  Reviews. 
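
The credibility form (6.1) is simple enough to state as a two-line function. The sketch below uses the hypothetical name `credibility` and arbitrary numbers; it only illustrates how the factor $Z = n/(n + v)$ shifts weight between the prior mean and the natural estimator.

```python
def credibility(prior_mean, natural_estimate, n, v):
    """(1 - Z) * prior mean + Z * natural estimator, with Z = n / (n + v)."""
    Z = n / (n + v)
    return (1.0 - Z) * prior_mean + Z * natural_estimate

# Assumed illustrative numbers: one equivalent sample, time constant 3.
print(credibility(prior_mean=2.0, natural_estimate=5.0, n=1, v=3))     # Z = 0.25
# A much larger equivalent sample size moves the prediction toward the data.
print(credibility(prior_mean=2.0, natural_estimate=5.0, n=300, v=3))
```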

6.1  Random  Error  Occurrence  Rate,  Fixed  Detection  Probabilities 

6.1.1  A  Simplified  Experiment 

Consider first a simplified experiment in which an integer-valued random variable, $n$, is Poisson-distributed, with the mean rate, $\lambda$, now considered to be a random quantity. A convenient prior on $\lambda$ is the Gamma$(a, b)$ density:

(6.2)  $p(\lambda \mid a, b) = \dfrac{b^{a} \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)}$  $(\lambda > 0)$.

The hyperparameters $(a, b)$ can be selected by estimating the first two prior moments of the error occurrence rate, since $E\{\lambda\} = a/b$, and $V\{\lambda\} = E\{\lambda\}/b$. (Note that randomizing on $\lambda$ is tantamount to saying that, a priori, $n$ is Negative Binomial$(a , (b + 1)^{-1})$.)

(6.2) is advantageous because the Gamma family is "closed under sampling", that is, if the outcome of a single experiment is $n = n_T$, then the posterior-to-data density, $p(\lambda \mid n_T)$, is just Gamma$(a + n_T , b + 1)$, i.e., has the same form as (6.2), but with updated parameters. This simplicity also extends to the posterior-to-data predictive mean of $\lambda$, which is of credibility form:

$E\{\lambda \mid n_T\} = f_\lambda(n_T ; 1 , b)$ .

If $a$ and $b$ are varied, keeping the prior mean, $E\{\lambda\} = a/b$, constant, we see that the time constant, $v = b$, shifts the credibility weight, $Z = (1 + b)^{-1}$, to be attached to the outcome of the experiment as a "credible" measure of $\lambda$. (In this sense, the credibility notation hides the fact that $E\{\lambda \mid n_T\}$ depends on both $a$ and $b$.)
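
The Gamma updating and its credibility interpretation can be checked numerically. In the sketch below the prior mean is held at an assumed value of 4 while $b$ is varied, and the posterior mean $(a + n_T)/(b + 1)$ is compared with the credibility form with one equivalent sample and time constant $b$. The function names and numbers are illustrative assumptions.

```python
def gamma_poisson_update(a, b, n_T):
    """Posterior Gamma parameters after observing one Poisson count n_T."""
    return a + n_T, b + 1.0

def credibility(prior_mean, natural_estimate, n, v):
    Z = n / (n + v)
    return (1.0 - Z) * prior_mean + Z * natural_estimate

prior_mean, n_T = 4.0, 10            # assumed illustrative values
results = []
for b in (0.1, 1.0, 10.0):
    a = prior_mean * b               # keeps E{lambda} = a / b fixed
    a_post, b_post = gamma_poisson_update(a, b, n_T)
    direct = a_post / b_post
    via_credibility = credibility(prior_mean, n_T, n=1, v=b)
    results.append((b, direct, via_credibility))
    print(b, direct, via_credibility)
```

A small $b$ (vague prior) leaves the posterior mean near the observed count, while a large $b$ keeps it near the prior mean.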

6.1.2  Undetected  Error  Likelihood  and  Posterior  Parameter  Density 

For our undetected error model, we use the more complex likelihood in (5.4), keeping $\mathbf{p}$ fixed, which becomes:

(6.3)  $p(n_T \mid \lambda , \mathbf{p}) \propto \lambda^{n_T} e^{-\lambda(1 - Q)}$ .

This modifies slightly the results of the last section, and we find that the posterior-to-data density of $\lambda$ is Gamma$(a + n_T , b + 1 - Q)$. The mean posterior value of the parameter is:

(6.4)  $E\{\lambda \mid \mathcal{D}^* , \mathbf{p}\} = E\{\lambda \mid n_T , Q\} = f_\lambda\!\left( \dfrac{n_T}{1 - Q} ; (1 - Q) , b \right)$ ,

so that, relative to our simplified experiment, the time constant is unchanged, but the equivalent number of samples is reduced. Since $Z$ depends upon the ratio $(n/v)$, somewhat less credibility is attached to the observation in this model.

6.1.3  Prediction  of  Undetected  Errors 

However, the posterior density of $\lambda$ is only an intermediate step to the result of interest, namely, determining the predictive density of $n_0$. Since $p(n_0 \mid \lambda , \mathbf{p})$ is Poisson$(\lambda Q)$ (5.3), the marginal (prior) density is:

(6.5)  $p(n_0 \mid Q) = \dfrac{\Gamma(a + n_0)}{\Gamma(a)\, n_0!} \left( \dfrac{b}{b + Q} \right)^{a} \left( \dfrac{Q}{b + Q} \right)^{n_0}$  $(n_0 = 0, 1, 2, \ldots)$

that is, Negative Binomial$(a , Q/(b + Q))$. Thus, before detecting errors, our opinion about the errors that will be undetected after the experiment is that $E\{n_0 \mid Q\} = aQ/b$, and $V\{n_0 \mid Q\} = (aQ/b)(1 + (Q/b))$.

From the updating found in Section 6.1.2, it follows that, after the detection experiment is over, we will predict that the density of $n_0$, $p(n_0 \mid \mathcal{D}^* , \mathbf{p}) = p(n_0 \mid n_T , Q)$, is Negative Binomial with updated parameters $(a + n_T , Q/(b + 1))$. For future reference, this predictive density satisfies the recursion:

$p(n_0 + 1 \mid n_T , Q) = p(n_0 \mid n_T , Q) \left( \dfrac{a + n_T + n_0}{n_0 + 1} \right) \left( \dfrac{Q}{b + 1} \right)$ .

Posterior-to-data, the predicted mean number of defects not yet found is then:

(6.6)  $E\{n_0 \mid \mathcal{D}^* , \mathbf{p}\} = E\{n_0 \mid n_T , Q\} = f_{n_0}\!\left( \dfrac{Q\, n_T}{1 - Q} ; (1 - Q) , b \right)$ .

As in (6.4), the hyperparameters enter as the ratio $(a/b)$ in determining $E\{n_0\}$, and $b$ becomes the credibility time constant in a credibility formula with $1 - Q$ equivalent samples. Thus, if $a$ and $b$, the parameters of the prior, are large (resp., close to 0), then the natural estimator, $\hat{n}_0 = Q n_T/(1 - Q)$, is weakly (resp., strongly) weighted in the prediction, relative to the prior opinion, $E\{n_0\}$. This is exactly what we would expect in comparing the results obtained with strong or weak prior opinion. Finally, note that when $\mathbf{p}$ is fixed, only $Q$ is in fact used, and only $n_T$ from the data is sufficient for $\lambda$ and $n_0$.
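
The Negative Binomial predictive density with parameters $(a + n_T , Q/(b + 1))$ can be generated term by term, since successive probabilities differ by the factor $\frac{a + n_T + n_0}{n_0 + 1} \cdot \frac{Q}{b + 1}$. The sketch below builds the density this way and compares its mean with the closed form $Q(a + n_T)/(b + 1 - Q)$ implied by those parameters and with the credibility form. The function name, stopping rule, and numerical values are assumptions.

```python
def predictive_undetected(a, b, n_T, Q, tol=1e-15):
    """Predictive density of n0 for fixed Q, by recursion and renormalization."""
    weights, peak, n0 = [1.0], 1.0, 0        # start from an unnormalized p(0) = 1
    while True:
        ratio = (a + n_T + n0) / (n0 + 1) * Q / (b + 1.0)
        w = weights[-1] * ratio
        weights.append(w)
        peak = max(peak, w)
        n0 += 1
        if ratio < 1.0 and w < tol * peak:   # geometric tail has become negligible
            break
    total = sum(weights)
    return [w / total for w in weights]

a, b, n_T, Q = 2.0, 0.5, 20, 0.2             # assumed illustrative values
density = predictive_undetected(a, b, n_T, Q)
mean_n0 = sum(k * pk for k, pk in enumerate(density))

closed_form = Q * (a + n_T) / (b + 1.0 - Q)
Z = (1.0 - Q) / (1.0 - Q + b)                # 1 - Q equivalent samples, constant b
credibility_form = (1.0 - Z) * (a * Q / b) + Z * (Q * n_T / (1.0 - Q))
print(mean_n0, closed_form, credibility_form)
```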

6.2 Random Detection Probabilities, Fixed Error Occurrence Rate

6.2.1 A Simplified Experiment

In the opposite situation, where $\lambda$ is known, but the $(p_i)$ are jointly random, it is instructive to first consider a simple one-dimensional experiment in which an integer-valued random variable, $n = 0, 1, 2, \ldots, M$, is Binomial$(p, M)$, with a fixed number of trials, $M$, but with the success probability, $p$, considered as a random quantity. The convenient natural conjugate prior is the Beta$(\alpha, \beta)$ density:

(6.7)  $p(p \mid \alpha, \beta) = B^{-1}(\alpha, \beta)\, p^{\alpha - 1} q^{\beta - 1}$  $(0 \le p = 1 - q \le 1)$

where $B$ is the Beta function, $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$. Henceforth, we abbreviate $\alpha + \beta = \gamma$. The hyperparameters $(\alpha, \beta)$ can be selected by estimating the first two moments of the detection probability, since $E\{p\} = 1 - E\{q\} = \alpha/\gamma$, and $V\{p\} = V\{q\} = E\{p\}E\{q\}(\gamma + 1)^{-1}$.

Because (6.7) is closed under sampling relative to the Binomial likelihood, if the outcome of this simplified experiment gives $n = n_S$ successes (and hence $M - n_S$ failures), then the posterior-to-data density of $p$ is again Beta, but with modified hyperparameters $(\alpha + n_S , \beta + M - n_S)$. The posterior-to-data mean predictor of the detection probability is also in credibility form:

(6.8)  $E\{p \mid n_S\} = 1 - E\{q \mid n_S\} = f_p\!\left( \dfrac{n_S}{M} ; M , \gamma \right)$ .

The hyperparameter ratio $(\alpha/\gamma)$ determines the prior mean $E\{p\}$, but in this case it is $\gamma = \alpha + \beta$ which becomes the effective credibility time constant. Naturally, there are $M$ effective trials, with a natural estimator from the data of $\hat{p} = 1 - \hat{q} = n_S/M$.
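
The Beta updating of this simplified experiment, and its agreement with the credibility form (6.8), can be checked in a few lines. The function name `beta_binomial_update` and the numbers are assumed for illustration.

```python
def beta_binomial_update(alpha, beta, n_S, M):
    """Posterior Beta hyperparameters after n_S successes in M trials."""
    return alpha + n_S, beta + M - n_S

alpha, beta, M, n_S = 2.0, 6.0, 20, 9        # assumed illustrative values
gamma = alpha + beta

alpha_post, beta_post = beta_binomial_update(alpha, beta, n_S, M)
posterior_mean = alpha_post / (alpha_post + beta_post)

# Credibility form: M equivalent trials, time constant gamma = alpha + beta.
Z = M / (M + gamma)
credibility_mean = (1.0 - Z) * (alpha / gamma) + Z * (n_S / M)
print(posterior_mean, credibility_mean)
```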


6.2.2  Joint  Detection  Probability  Prior  and  Posterior  Densities 


In our $I$-dimensional error detection model, it is natural to assume that, a priori, the $(p_i)$ are independently distributed as in (6.7), but with possibly different hyperparameters, so that $p(\mathbf{p} \mid \boldsymbol{\alpha}, \boldsymbol{\beta})$ is $\prod_{i=1}^{I} \text{Beta}(\alpha_i , \beta_i)$. (With a slight increase in complexity, one could also start with Dirichlet-distributed $\mathbf{p}$.)

The appropriate part of the data likelihood (5.4), with $\lambda$ fixed, becomes:

(6.9)  $p(\mathcal{D}^* \mid \lambda , \mathbf{p}) \propto e^{-\lambda(1 - Q)} \prod_{i=1}^{I} p_i^{\,n(i)}\, q_i^{\,n_T - n(i)}$ ,


and we see that the first term prevents finding a simple natural conjugate prior, and introduces a rather complex coupling between the $(p_i)$ through $Q = \prod_i q_i$. However, if $\lambda$ were very small, so that $n_T$ were also small and hence there would be no jointly found errors, then the first term would be approximately unity, and the application of Bayes' law would update each of the hyperparameter groups independently to $(\alpha_i + n(i) , \beta_i + n_T - n(i))$ in a manner similar to the last subsection, with the reasonable interpretation that $n(i)$ is the number of "successes" for inspector $i$ out of $n_T$ total "trials".

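For the general case in which $\lambda$ is of arbitrary size, we expand $e^{+\lambda Q}$ in an infinite series, and find, after some algebra, the (normalized) posterior-to-data joint density of the detection parameters as:

(6.10a)  $p(\mathbf{p} \mid \mathcal{D}^* , \lambda) = \left[ \sum_{k=0}^{\infty} c_k \right]^{-1} \sum_{j=0}^{\infty} c_j \prod_{i=1}^{I} \text{Beta}_i\bigl( \alpha_i + n(i) ,\ \beta_i + n_T - n(i) + j \bigr)$

with

(6.10b)  $c_j = c_j(\mathcal{D}^* , \lambda) = \dfrac{\lambda^{j}}{j!} \prod_{i=1}^{I} \dfrac{\Gamma(\beta_i + n_T - n(i) + j)\ \Gamma(\gamma_i + n_T)}{\Gamma(\beta_i + n_T - n(i))\ \Gamma(\gamma_i + n_T + j)}$ .

Beta$_i$, of course, refers to the usual one-dimensional Beta density for $p_i$.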
Marginally, one can find:

(6.11)  $p(p_i \mid \mathcal{D}^* , \lambda) = \left[ \sum_{k=0}^{\infty} c_k \right]^{-1} \sum_{j=0}^{\infty} c_j\, \text{Beta}_i\bigl( \alpha_i + n(i) ,\ \beta_i + n_T - n(i) + j \bigr)$

but this is misleading, as the $(p_i)$ are now dependent random variables. In general, then, the posterior parameter density is a data-weighted combination of a sequence of simpler experiments, $j = 0, 1, 2, \ldots$, in which inspector $i$ has $n(i)$ "successes" out of $n_T + j$ "trials".

The predictive mean of $p_i$ can be expressed as a weighted sum of credibility forms similar to (6.8); for later use, we record the mean predictor for the overall miss probability:

(6.12)  $E\{Q \mid \mathcal{D}^* , \lambda\} = \left[ \sum_{k=0}^{\infty} c_k \right]^{-1} \sum_{j=0}^{\infty} c_j \prod_{i=1}^{I} f_{q_i}\!\left( 1 - \dfrac{n(i)}{n_T + j} ;\ n_T + j ,\ \gamma_i \right)$

which has obvious credibility interpretations, in light of the above remarks about "trials" and "successes".
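
To make the weighted sum concrete, the sketch below evaluates the posterior mean of $Q$ for fixed $\lambda$ by accumulating the weights $c_j$ through their ratio $c_{j+1}/c_j = \frac{\lambda}{j+1} \prod_i \frac{\beta_i + n_T - n(i) + j}{\gamma_i + n_T + j}$, which follows from (6.10b). Each factor in the product is the posterior mean of $q_i$ in the $j$th simpler experiment. The function name, the fixed truncation at 2000 terms, and all numerical values are assumptions.

```python
def expected_overlook(lam, alpha, beta, n_i_total, n_T, terms=2000):
    """Posterior mean of Q for a known rate lam, truncating the series."""
    c = 1.0                      # c_0 = 1
    num = den = 0.0
    for j in range(terms):
        q_means = 1.0            # product of mean miss probabilities, experiment j
        for a_i, b_i, n_i in zip(alpha, beta, n_i_total):
            q_means *= (b_i + n_T - n_i + j) / (a_i + b_i + n_T + j)
        den += c
        num += c * q_means
        c *= lam / (j + 1) * q_means      # c_{j+1} obtained from c_j
    return num / den

# Assumed illustrative data: two inspectors, Beta(2, 2) priors.
alpha, beta = [2.0, 2.0], [2.0, 2.0]
n_i_total, n_T = [30, 25], 40

EQ_small = expected_overlook(1e-9, alpha, beta, n_i_total, n_T)
EQ_large = expected_overlook(50.0, alpha, beta, n_i_total, n_T)
print(EQ_small, EQ_large)
```

For a very small rate only the $j = 0$ term matters, which reproduces the independent updating described earlier; a larger rate shifts weight to experiments with more "trials" and raises the predicted miss probability.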

6.2.3  Prediction  of  Undetected  Errors 

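Since $p(n_0 \mid \lambda , \mathbf{p})$ is Poisson$(\lambda Q)$, one uses the same trick as above to determine the marginal (prior) density of $n_0$:

(6.13)  $p(n_0 \mid \lambda) = \dfrac{\lambda^{n_0}}{n_0!} \sum_{j=0}^{\infty} \dfrac{(-\lambda)^{j}}{j!} \prod_{i=1}^{I} \dfrac{\Gamma(\beta_i + n_0 + j)\ \Gamma(\gamma_i)}{\Gamma(\beta_i)\ \Gamma(\gamma_i + n_0 + j)}$ .

By previous results, our prior opinion about the mean outcome must be $E\{n_0 \mid \lambda\} = \lambda E\{Q\} = \lambda \prod_i (\beta_i/\gamma_i)$.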

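Surprisingly, the posterior-to-data predictive density of $n_0$ is simpler than (6.10) or (6.13), as there is a fortuitous cancellation of the term $e^{+\lambda Q}$ in (6.9) with the $e^{-\lambda Q}$ of the Poisson density. After some algebra, we find:

(6.14)  $p(n_0 \mid \mathcal{D}^* , \lambda) = p(0 \mid \mathcal{D}^* , \lambda)\ \dfrac{\lambda^{n_0}}{n_0!} \prod_{i=1}^{I} \dfrac{\Gamma(\beta_i + n_T - n(i) + n_0)\ \Gamma(\gamma_i + n_T)}{\Gamma(\beta_i + n_T - n(i))\ \Gamma(\gamma_i + n_T + n_0)}$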

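which can be put into recursive form, suitable for computation, as:

(6.15)  $\dfrac{p(n_0 + 1 \mid \mathcal{D}^* , \lambda)}{p(n_0 \mid \mathcal{D}^* , \lambda)} = \dfrac{\lambda}{n_0 + 1} \prod_{i=1}^{I} f_{q_i}\!\left( 1 - \dfrac{n(i)}{n_T + n_0} ;\ n_T + n_0 ,\ \gamma_i \right)$ .

(In practice, one sets $p(0 \mid \mathcal{D}^* , \lambda)$ to unity, computes successive probabilities until they become negligible, and then renormalizes. Because of the speed of this method, it appears best to compute the moments of $n_0$ numerically, rather than using, say, $E\{n_0 \mid \mathcal{D}^* , \lambda\} = \lambda E\{Q \mid \mathcal{D}^* , \lambda\}$ and (6.12).)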

Note the reappearance of the mean credibility predictors for the $(q_i)$, this time with $n(i)$ "successes" out of $n_T + n_0$ "trials" for inspector $i$. Because the $f_{q_i}$ approach unity as $n_0 \to \infty$, the density (6.15) will have a Poisson tail, with parameter $\lambda$.
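
The computational recipe just described (set the first probability to unity, recurse, stop when terms are negligible, renormalize) can be sketched as follows. Each factor in the ratio is written explicitly as $(\beta_i + n_T - n(i) + n_0)/(\gamma_i + n_T + n_0)$, the posterior mean of $q_i$ with $n(i)$ successes in $n_T + n_0$ trials. The function name, stopping tolerance, and data are assumptions made for illustration.

```python
def predictive_density(lam, alpha, beta, n_i_total, n_T, tol=1e-14):
    """Predictive density of n0 for a known rate lam, by forward recursion."""
    weights, peak, n0 = [1.0], 1.0, 0          # unnormalized p(0) set to unity
    while True:
        ratio = lam / (n0 + 1)
        for a_i, b_i, n_i in zip(alpha, beta, n_i_total):
            ratio *= (b_i + n_T - n_i + n0) / (a_i + b_i + n_T + n0)
        w = weights[-1] * ratio
        weights.append(w)
        peak = max(peak, w)
        n0 += 1
        if ratio < 1.0 and w < tol * peak:     # past the mode and negligible
            break
    total = sum(weights)
    return [w / total for w in weights]

# Assumed illustrative data: two inspectors, Beta(2, 2) priors, rate 50.
lam = 50.0
alpha, beta = [2.0, 2.0], [2.0, 2.0]
n_i_total, n_T = [30, 25], 40

density = predictive_density(lam, alpha, beta, n_i_total, n_T)
mean_n0 = sum(k * pk for k, pk in enumerate(density))
mode_n0 = max(range(len(density)), key=density.__getitem__)
print(mean_n0, mode_n0, len(density))
```

Moments, the mode, and tail probabilities all come directly from the normalized list, which is why the numerical route is convenient.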


6.3.1  Joint  Parameter  Prior  and  Posterior  Densities 

With these preliminary formulae and interpretations over, we can move quickly through the general case in which both $\lambda$ and the $(p_i)$ are random quantities. For simplicity, we combine the previous priors in an independent manner, so that $p(\lambda ; \mathbf{p} \mid a, b ; \boldsymbol{\alpha}, \boldsymbol{\beta})$ is Gamma$(a, b) \prod_{i=1}^{I} \text{Beta}_i(\alpha_i , \beta_i)$.

The full form of the likelihood (5.4) must now be used. We note that,
except for the term $e^{+\lambda Q}$, we would have independent updating of each
component of the prior according to:

$$
a' = a + n_T ; \quad b' = b + 1 ; \quad \alpha_i' = \alpha_i + n(i) ; \quad
\beta_i' = \beta_i + n_T - n(i) ; \quad \gamma_i' = \gamma_i + n_T .
\tag{6.16}
$$

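But the coupling term can be expanded into a power series, as in previous
subsections, so the posterior-to-data joint parameter density becomes, after
normalization:

$$
p(\lambda, \mathbf{p} \mid \mathcal{D}^*) = \Bigl[\sum_{k=0}^{\infty} d_k\Bigr]^{-1}
\sum_{j=0}^{\infty} d_j \,\mathrm{Gamma}(a' + j,\, b') \prod_{i=1}^{I} \mathrm{Beta}_i(\alpha_i',\, \beta_i' + j) ,
\tag{6.17a}
$$

with

$$
d_j = d_j(\mathcal{D}^*) = \frac{(b')^{-j}}{j!}\, \frac{\Gamma(a' + j)}{\Gamma(a')}
\prod_{i=1}^{I} \frac{\Gamma(\beta_i' + j)\,\Gamma(\gamma_i')}{\Gamma(\beta_i')\,\Gamma(\gamma_i' + j)} .
\tag{6.17b}
$$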


In general, both $\lambda$ and the $(p_i)$ are correlated, a posteriori. Moments of
the parameters can now be obtained as in (6.12), provided one can compute the
coefficients $(d_j(\mathcal{D}^*))$ (see below).

6.3.2  Prediction  of  Undetected  Errors

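Using (5.3), one finds the marginal density of $n_0$, prior to inspecting
for errors, to be a rather complex combination of (6.5) and (6.13):

$$
p(n_0) = \frac{b^{-n_0}}{n_0!} \sum_{j=0}^{\infty} \frac{(-b)^{-j}}{j!}\, \frac{\Gamma(a + n_0 + j)}{\Gamma(a)}
\prod_{i=1}^{I} \frac{\Gamma(\beta_i + n_0 + j)\,\Gamma(\gamma_i)}{\Gamma(\beta_i)\,\Gamma(\gamma_i + n_0 + j)} .
\tag{6.18}
$$

Of course, a priori $E\{n_0\} = E\{\lambda Q\} = E\{\lambda\} E\{Q\} = (a/b) \prod_i (\beta_i / \gamma_i)$.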

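Again, we are surprised to find that the complexity of (6.17) and (6.18)
is not carried over into the predictive density of $n_0$, because of the
fortuitous cancellation of two exponential terms. After some algebra, we
find:

$$
p(n_0 \mid \mathcal{D}^*) = p(0 \mid \mathcal{D}^*)\, \frac{(b')^{-n_0}}{n_0!}\, \frac{\Gamma(a' + n_0)}{\Gamma(a')}
\prod_{i=1}^{I} \frac{\Gamma(\beta_i' + n_0)}{\Gamma(\beta_i')}\, \frac{\Gamma(\gamma_i')}{\Gamma(\gamma_i' + n_0)} ,
\tag{6.19}
$$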


which should be compared with (6.14) and the updated version of (6.5). In
fact, from (6.17), we see that:

$$
p(n_0 \mid \mathcal{D}^*) = \frac{d_{n_0}(\mathcal{D}^*)}{\sum_{j=0}^{\infty} d_j(\mathcal{D}^*)} ,
$$

which gives a logical interpretation to the weights in that formula.
Similarly, $E\{n_0 \mid \mathcal{D}^*\}$ could also be expressed as the ratio of two weighted sums
of products of linear credibility formulae.

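As in (6.15), the predictive density can be put into recurrence form as:

$$
\frac{p(n_0 + 1 \mid \mathcal{D}^*)}{p(n_0 \mid \mathcal{D}^*)}
= \frac{a + n_T + n_0}{(b + 1)(n_0 + 1)}
\prod_{i=1}^{I} \frac{\beta_i + n_T - n(i) + n_0}{\gamma_i + n_T + n_0} .
\tag{6.20}
$$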


Numerical computation is very efficient; by setting $p(0 \mid \mathcal{D}^*) = 1$, one in
fact computes the coefficients $d_{n_0}(\mathcal{D}^*)$, and then gets the predictive
density through normalization. Moments of $n_0$ are thus best found numerically.
As $n_0 \to \infty$, each credibility factor $f_i \to 1$ for every $i$, so that
$p(n_0 \mid \mathcal{D}^*)$ has a Negative Binomial$(a + n_T, (b + 1)^{-1})$ tail, similar to the
predictive density with $\mathbf{p}$ fixed in Subsection 6.1.3.
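The normalize-after-recursion procedure just described can be sketched in Python. This is an illustrative sketch, not the author's code: the function name, `tol`, and `max_terms` are assumptions, and the ratio is the one implied by successive terms of (6.19), that is, $(a + n_T + n_0)/((b+1)(n_0+1))$ times the product of $(\beta_i + n_T - n(i) + n_0)/(\gamma_i + n_T + n_0)$.

```python
def predictive_density(n_found, n_total, alpha, beta, a, b,
                       tol=1e-12, max_terms=100_000):
    """Return the normalized predictive density [p(0), p(1), ...] of n0.

    n_found     : list of n(i), errors found by each inspector
    n_total     : n_T, number of distinct errors found
    alpha, beta : lists of Beta prior parameters for the p_i
    a, b        : Gamma prior parameters for lambda (b is a rate)
    """
    weights = [1.0]   # unnormalized p(0 | data) = 1; weights play the role of d_n0
    total = 1.0
    for n0 in range(max_terms):
        ratio = (a + n_total + n0) / ((b + 1.0) * (n0 + 1))
        for n_i, a_i, b_i in zip(n_found, alpha, beta):
            ratio *= (b_i + n_total - n_i + n0) / (a_i + b_i + n_total + n0)
        nxt = weights[-1] * ratio
        weights.append(nxt)
        total += nxt
        # illustrative stopping rule: decreasing terms that are negligible
        if ratio < 1.0 and nxt < tol * total:
            break
    return [w / total for w in weights]


# Example inputs: eight inspectors, uniform Beta priors, Gamma(2, 0.02) prior.
n8 = [50, 55, 42, 47, 50, 44, 51, 50]
density = predictive_density(n8, 99, [1.0] * 8, [1.0] * 8, a=2.0, b=0.02)
mean = sum(k * p for k, p in enumerate(density))
mode = max(range(len(density)), key=density.__getitem__)
print(mean, mode)
```

Since the tail ratio tends to $1/(b+1)$, a small $b$ makes the tail decay slowly, so the number of terms needed before truncation grows as the prior on $\lambda$ becomes more diffuse.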

Of course, the various special results of Subsections 6.1 and 6.2 can
now be obtained from the formulae above through appropriate limiting values of
the hyperparameters.


7.  THE  POSTERIOR  MODE


Although (6.20) permits the calculation of any moment of $n_0$, it is
difficult to compare these Bayesian results with the classical point
estimators of Section 3.

However, the mode of the predictive density is easily found as the
smallest integer $n_0$ for which $p(n_0 + 1 \mid \mathcal{D}^*) \le p(n_0 \mid \mathcal{D}^*)$. From (6.20),
after some rearranging, we find that this mode is the smallest integer not less
than the solution $n_0^*$ to:

$$
n_0^* + 1 = \left[ n_0^* + n_T + \left( \frac{b}{b + 1} \right) \bigl( E\{\lambda\} - n_0^* - n_T \bigr) \right]
\prod_{i=1}^{I} \left[ 1 - \frac{n(i) + \gamma_i E\{p_i\}}{n_0^* + n_T + \gamma_i} \right] ,
\tag{7.1}
$$

which should be compared with (3.2), rewritten with $\hat{n}_0 = \hat{N} - n_T$:

$$
\hat{n}_0 = \bigl[ \hat{n}_0 + n_T \bigr] \prod_{i=1}^{I} \left[ 1 - \frac{n(i)}{\hat{n}_0 + n_T} \right] ,
\tag{7.2}
$$

whence we can easily see the effect of adding prior opinion.
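The rearrangement behind (7.1) can be spelled out as a short worked derivation. It assumes the recurrence has the ratio form implied by (6.19), and uses the prior means $E\{\lambda\} = a/b$ and $E\{p_i\} = \alpha_i/\gamma_i$. Setting the ratio of successive probabilities equal to one at $n_0 = n_0^*$ gives the first line; the other two lines are identities that recast its two factors in terms of the prior means.

```latex
\begin{align*}
n_0^* + 1 &= \frac{a + n_T + n_0^*}{b + 1}
  \prod_{i=1}^{I} \frac{\beta_i + n_T - n(i) + n_0^*}{\gamma_i + n_T + n_0^*} , \\
\frac{a + n_T + n_0^*}{b + 1} &= n_0^* + n_T
  + \frac{b}{b + 1}\Bigl( E\{\lambda\} - n_0^* - n_T \Bigr) , \\
\frac{\beta_i + n_T - n(i) + n_0^*}{\gamma_i + n_T + n_0^*} &=
  1 - \frac{n(i) + \gamma_i E\{p_i\}}{n_0^* + n_T + \gamma_i} .
\end{align*}
```

Written this way, $b/(b+1)$ acts as the weight pulling the data-based count $n_0^* + n_T$ toward the prior mean $E\{\lambda\}$, and each $\gamma_i$ acts as a number of prior "trials" with $\gamma_i E\{p_i\}$ prior "detections".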

If $b \to 0$ and $\gamma_i \to 0$, with constant prior means $E\{\lambda\}$ and $E\{p_i\}$,
$n_0^*$ approaches $\hat{n}_0$, so that this would correspond to "diffuse" prior
knowledge (although, for the Beta density, $\alpha = \beta = 1$ and $\gamma = 2$ is usually
considered the diffuse case). Conversely, as $b \to \infty$, with constant $E\{\lambda\}$,
the mode approaches the integer above the solution to:

$$
n_0^* + 1 = E\{\lambda\} \prod_{i=1}^{I} \left[ 1 - \frac{n(i) + \gamma_i E\{p_i\}}{n_0^* + n_T + \gamma_i} \right] ,
\tag{7.3}
$$


or, if all the $\gamma_i \to \infty$, with $E\{p_i\}$ fixed, the mode is the integer above
the solution to:

$$
n_0^* + 1 = \frac{(a + n_T - 1)\, Q}{b + 1 - Q} , \qquad Q = \prod_{i=1}^{I} \bigl( 1 - E\{p_i\} \bigr) ,
\tag{7.4}
$$

which is practically the posterior mean (6.6). Thus, the posterior-to-data
mode of the predictive density is intimately related to, and a natural
generalization of, the Petersen-Chapman-Darroch estimators.
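Because the mode is defined through the ratio of successive probabilities alone, it can be located without normalizing the density. The sketch below illustrates this; the function name and `max_terms` are assumptions made here, and the ratio is again the one implied by (6.19).

```python
def predictive_mode(n_found, n_total, alpha, beta, a, b, max_terms=1_000_000):
    """Smallest n0 with p(n0 + 1 | data) <= p(n0 | data), using the ratio only."""
    for n0 in range(max_terms):
        ratio = (a + n_total + n0) / ((b + 1.0) * (n0 + 1))
        for n_i, a_i, b_i in zip(n_found, alpha, beta):
            ratio *= (b_i + n_total - n_i + n0) / (a_i + b_i + n_total + n0)
        if ratio <= 1.0:
            return n0
    raise RuntimeError("no mode found within max_terms")


# Example inputs: two inspectors, uniform Beta priors, Gamma(2, 0.02) prior.
print(predictive_mode([55, 47], 79, [1.0, 1.0], [1.0, 1.0], a=2.0, b=0.02))
```

The same loop applied with very small $b$ and $\gamma_i$ gives a quick numerical way to compare the Bayesian mode with the classical estimate.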


8.  OTHER  MODELS 


A variety of related error-detection models can be developed using the
above methods. For example, in animal census studies, it is often assumed
that the capture probability remains constant at each trial; this is the
same as assuming that the individual detection probabilities $(p_i)$ are equal
to some common unknown value, $p$. For this case, one can easily show that
the two statistics, $\mathcal{D}^{**} = \{ n_T ,\ \bar{n} = I^{-1} \sum_i n(i) \}$, are sufficient, giving
estimators:


(8.1) 


If we assume a Beta$(\alpha, \beta)$ prior for $p$, $\lambda$ remaining Gamma$(a,b)$ a priori,
we find that, corresponding to (6.20), the predictive density for undetected
errors satisfies the recursion:

$$
\frac{p(n_0 + 1 \mid \mathcal{D}^{**})}{p(n_0 \mid \mathcal{D}^{**})}
= \left( \frac{a + n_T + n_0}{b + 1} \right) \frac{1}{n_0 + 1}
\prod_{j=0}^{I-1} \left( \frac{\beta + I (n_T - \bar{n} + n_0) + j}{\gamma + I (n_T + n_0) + j} \right) .
$$

The convergence of the estimate of $Q$ with increasing $I$ is quite rapid
because of the increased rate of learning about $p$ in this model.
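A sketch of this common-$p$ recursion in Python follows. It assumes the ratio takes the form of a Gamma-Poisson factor $(a + n_T + n_0)/((b+1)(n_0+1))$ times a product over $j = 0, \ldots, I-1$ of $(\beta + I(n_T - \bar{n} + n_0) + j)/(\gamma + I(n_T + n_0) + j)$, with $\gamma = \alpha + \beta$ and $\bar{n}$ the average of the $n(i)$; the function name, `tol`, and `max_terms` are illustrative assumptions.

```python
def predictive_density_common_p(n_found, n_total, alpha, beta, a, b,
                                tol=1e-12, max_terms=100_000):
    """Normalized predictive density of n0 when all inspectors share one p.

    alpha, beta : scalar Beta prior parameters for the common p
    a, b        : Gamma prior parameters for lambda (b is a rate)
    """
    num_inspectors = len(n_found)
    n_bar = sum(n_found) / num_inspectors     # average count per inspector
    gamma = alpha + beta
    weights = [1.0]                           # unnormalized p(0 | data) = 1
    total = 1.0
    for n0 in range(max_terms):
        ratio = (a + n_total + n0) / ((b + 1.0) * (n0 + 1))
        for j in range(num_inspectors):
            ratio *= ((beta + num_inspectors * (n_total - n_bar + n0) + j)
                      / (gamma + num_inspectors * (n_total + n0) + j))
        nxt = weights[-1] * ratio
        weights.append(nxt)
        total += nxt
        if ratio < 1.0 and nxt < tol * total:  # illustrative stopping rule
            break
    return [w / total for w in weights]


# Example inputs: two inspectors, uniform prior on p, Gamma(2, 0.02) prior.
density = predictive_density_common_p([55, 47], 79, 1.0, 1.0, a=2.0, b=0.02)
print(sum(k * p for k, p in enumerate(density)))
```

With a single inspector the product has one factor and the recursion coincides with the general one, which gives a convenient consistency check.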

A related variation occurs when the inspectors have a common "unit"
detection probability, but expend different known amounts of effort or search
duration $(e_i)$; this is tantamount to assuming $p_i = e_i p$ $(i = 1, 2, \ldots, I)$.

One can also assume that the error detection or correction process is
defective, or that new errors can enter randomly during inspection; this leads
to likelihoods related to those already analyzed for non-closed animal
population studies (Seber (1982)). Or, one can assume that detection probabilities
are different for different error types (Otis et al. (1978)). And so forth.


Finally, one could also make a Bayesian analysis of the serial inspection
strategy; however, as explained earlier, we expect this to be less
efficient at predicting unfound errors because less information about the
unknown detection probabilities is generated. A comparison between these
two approaches will be the subject of a forthcoming paper.


9.  OTHER  BAYESIAN  MODELS

Apart from an elementary model for $I = 2$ by Gaskell and George (1972),
a partly Bayesian approach by Carle and Strub (1978), a sequential sampling
plan for $I = 1$ by Yang et al. (1982), and a sequential sampling plan for
$I = 2$ by Freeman (1973), the only general Bayesian model of which the author
is aware is by Castledine (1981). Starting with the likelihood (3.2), he
assumes:

either that I: all $p_i = p$, which is Beta$(\alpha, \beta)$,

or that II: each i.i.d. $p_i$ is Beta$(\alpha, \beta)$,

and that $N$ has an arbitrary independent prior, $\pi(N)$. From this point on,
his argument is mostly numerical or approximative in nature, concentrating on
$\pi(N)$ constant or $\pi(N) \propto N^{-1}$. Some other more complex variations are also
explored, for example, a two-stage model in which $(\ln \alpha_i)$ is Normal$(\theta, \sigma^2)$,
$\sigma^2$ known, and $\theta$ is also normally distributed with known hyperparameters.
However, additional approximations appear necessary to interpret these
variations.

This is in contrast to our results, which require $N \sim$ Poisson$(\lambda)$ and
$\lambda \sim$ Gamma$(a,b)$, which is tantamount to assuming $N$ is Negative Binomial, a
priori. While this assumption may be of limited validity in animal
population studies, it seems like a useful starting point for reliability modelling,
at least until empirical error and defects distributions are available (Yang
et al. (1982) argue a Gamma-Poisson assumption in proofreading manuscripts).
Our predictive densities also have the advantage that they can be expressed
in closed form, with "credibility" interpretations for many of the components,
and the posterior mode can be related to the classical Petersen-Chapman-Darroch
formula.


Other  Bayesian  variations  will,  no  doubt,  also  prove  useful  in  application. 


10.  NUMERICAL  BEHAVIOR  OF  THE  BAYESIAN  ESTIMATOR 

To obtain some idea of the numerical properties of (6.19), simulations
were run using various priors, and various values of $I$.

For the detection probabilities, it was assumed that for the Beta priors,
$\alpha_i = \beta_i = 1.0$, which gives uniform densities for all $i$. Three cases of
error rate prior were examined:

Case    $E\{\lambda\}$    $V\{\lambda\}$
I       50        1250
II      100       5000
III     200       20000

The shape parameter $a$ of the Gamma prior was kept constant at $a = 2$, with
$b$ adjusted to give the above moments. Since $N_{\text{true}}$ was 100, it can be
seen that these correspond to low, O.K., and high prior estimates, though of
course $N = 100$ could have occurred from any prior.
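The values of $b$ behind the three cases follow from the moments of the Gamma density. As a worked check, assuming the rate parameterization in which $E\{\lambda\} = a/b$ and $V\{\lambda\} = a/b^2$:

```latex
\begin{align*}
b &= \frac{a}{E\{\lambda\}} , \qquad V\{\lambda\} = \frac{a}{b^{2}} , \qquad a = 2 ; \\
\text{Case I:}\quad   b &= 2/50  = 0.04 , \qquad V\{\lambda\} = 2/(0.04)^{2} = 1250 ; \\
\text{Case II:}\quad  b &= 2/100 = 0.02 , \qquad V\{\lambda\} = 2/(0.02)^{2} = 5000 ; \\
\text{Case III:}\quad b &= 2/200 = 0.01 , \qquad V\{\lambda\} = 2/(0.01)^{2} = 20000 .
\end{align*}
```

With $a$ fixed, each prior has the same coefficient of variation, $1/\sqrt{a} \approx 0.71$, so the three cases differ only in scale.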

Then, one sample of data was obtained for $I = 1, 2, 4,$ and 8, with
assumed values $p_i = 0.5$ for all $i$. The data sets obtained were:

$I = 1$:   $n_T = 45$,   $\mathbf{n} = (45)$
$I = 2$:   $n_T = 79$,   $\mathbf{n} = (55, 47)$
$I = 4$:   $n_T = 95$,   $\mathbf{n} = (48, 52, 57, 45)$
$I = 8$:   $n_T = 99$,   $\mathbf{n} = (50, 55, 42, 47, 50, 44, 51, 50)$.

Of  course,  the  results  would  have  been  quite  different  in  another  simulation. 

The classical estimator, $\hat{N}$, does not exist for $I = 1$; but would have
given values of 112.39, 101.31, and 99.45, that is, $\hat{n}_0 = 33.39$, 6.31, and
0.45 for $I = 2, 4, 8$ respectively.
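These classical values can be checked numerically. The sketch below assumes the classical estimator is the root $\hat{N} > n_T$ of $\hat{N} - n_T = \hat{N} \prod_i (1 - n(i)/\hat{N})$, which is the form obtained from (7.2) with $\hat{n}_0 = \hat{N} - n_T$, and solves it by plain bisection. The function name, the search bound `upper`, and the iteration count are illustrative choices made here.

```python
def chapman_darroch_estimate(n_found, n_total, upper=1e7, iterations=200):
    """Solve N - n_T = N * prod(1 - n(i)/N) for N > n_T by bisection."""

    def g(big_n):
        prod = 1.0
        for n_i in n_found:
            prod *= 1.0 - n_i / big_n
        return big_n - n_total - big_n * prod

    lo, hi = float(n_total), float(upper)
    # With one inspector g is identically zero, so no estimate exists.
    if g(lo) >= 0.0 or g(hi) <= 0.0:
        raise ValueError("no sign change: estimate does not exist for these data")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for counts, n_t in [([55, 47], 79),
                    ([48, 52, 57, 45], 95),
                    ([50, 55, 42, 47, 50, 44, 51, 50], 99)]:
    n_hat = chapman_darroch_estimate(counts, n_t)
    print(len(counts), round(n_hat, 2), round(n_hat - n_t, 2))
```

For $I = 2$ the equation reduces to the Petersen form $n(1)\,n(2)/n_{12}$ with $n_{12} = 55 + 47 - 79 = 23$, so the first value can also be confirmed by hand as $2585/23 \approx 112.39$.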

Figure 4 shows the density $p(n_0 \mid \mathcal{D}^*)$ for $I = 1$, for the three priors
given above; the effect of the priors on the predictive mean, though not on
the shape, can be clearly seen. Figure 5 shows that the predictive density
develops an interior mode when $I = 2$, although the difference due to
different priors is less perceptible. For $I = 4$ and 8, the effect of the
priors is barely perceptible, so that Figure 6 shows just case II above; for
$I = 8$, the mode is again at $n_0 = 0$.


FIGURE 5. Predictive Density for one sample from two observers, three different priors
(Continuous curve approximates discrete density)